[None][fix] fix PEFT page accumulation in MaxUtilizationPolicy scheduler#13528

Open
achartier wants to merge 2 commits into NVIDIA:main from achartier:fix/peft-page-accounting

Conversation

@achartier
Collaborator

@achartier achartier commented Apr 27, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Fixed PEFT page counter tracking in the scheduler to properly maintain cumulative page usage across multiple requests, preventing resource accounting inconsistencies.
  • Tests
    • Added a test validating that PEFT page limits are correctly enforced when scheduling multiple LoRA/PEFT requests with distinct task identifiers.

Description

In MaxUtilizationPolicy._try_scheduling_request, num_scheduled_peft_pages was passed by value and never returned, so the caller's accumulator stayed at 0 across all requests. This caused the per-batch PEFT page cap to be checked against 0 already-scheduled pages for every request, potentially over-admitting LoRA tasks beyond the device cache limit.

The fix mirrors the C++ reference (capacityScheduler.cpp): return the updated count from _try_scheduling_request and accumulate it in the caller.

Add a unit test that fails before the fix (3 tasks admitted when only 2 fit within max_pages=25) and passes after.
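The bug and fix described above can be sketched in standalone form. This is an illustration of the pass-by-value pitfall, not the actual TensorRT-LLM scheduler code; function and variable names are simplified stand-ins:

```python
def try_scheduling_buggy(req_pages, num_scheduled_peft_pages, max_pages):
    # Python ints are immutable, so rebinding the parameter here is
    # invisible to the caller: its accumulator never moves off 0.
    if num_scheduled_peft_pages + req_pages > max_pages:
        return False
    num_scheduled_peft_pages += req_pages  # update is lost on return
    return True

def try_scheduling_fixed(req_pages, num_scheduled_peft_pages, max_pages):
    # Mirror the C++ reference: return the updated count so the
    # caller can accumulate it across requests.
    if num_scheduled_peft_pages + req_pages > max_pages:
        return False, num_scheduled_peft_pages
    return True, num_scheduled_peft_pages + req_pages

def schedule_buggy(page_requests, max_pages):
    admitted, total = [], 0
    for pages in page_requests:
        if try_scheduling_buggy(pages, total, max_pages):
            admitted.append(pages)
        # total is never updated, so every request is checked
        # against 0 already-scheduled pages
    return admitted

def schedule_fixed(page_requests, max_pages):
    admitted, total = [], 0
    for pages in page_requests:
        ok, total = try_scheduling_fixed(pages, total, max_pages)
        if ok:
            admitted.append(pages)
    return admitted
```

With three 10-page requests against a 25-page cap, the buggy loop admits all three while the fixed loop admits only two, matching the failure mode the unit test exercises.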

Test Coverage

Added new test

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
@achartier achartier requested a review from a team as a code owner April 27, 2026 21:25
@achartier achartier requested a review from byshiue April 27, 2026 21:25
@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

The changes modify the scheduler's PEFT page tracking mechanism in MaxUtilizationPolicy._try_scheduling_request to return both scheduling status and accumulated PEFT page count, ensuring consistent page accounting across scheduling iterations. A new unit test validates cumulative PEFT page usage enforcement across multiple LoRA/PEFT requests.

Changes

  • Scheduler PEFT page tracking (tensorrt_llm/_torch/pyexecutor/scheduler/scheduler.py): Modified _try_scheduling_request return type from bool to a (bool, int) tuple, preserving the PEFT page counter on early exits and returning the cumulative scheduled page count alongside the scheduling status.
  • Scheduler tests (tests/unittest/_torch/executor/test_py_scheduler.py): Added a test case validating cumulative PEFT page usage across multiple requests, verifying that global PEFT page limits are enforced when scheduling multiple LoRA/PEFT requests with distinct task identifiers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check ✅: The title clearly identifies the main change: fixing PEFT page accumulation in the MaxUtilizationPolicy scheduler, which directly matches the core issue described in the PR.
  • Description check ✅: The description clearly explains the bug (num_scheduled_peft_pages passed by value), its impact (over-admission beyond cache limits), the fix (return updated count and accumulate), and test coverage (unit test demonstrating the fix). All required template sections are addressed.
  • Linked Issues check ✅: Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅: Skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unittest/_torch/executor/test_py_scheduler.py (1)

2076-2100: Regression coverage is solid, and QA integration list updates are unnecessary for this unit-test-only change.

This test cleanly reproduces the accumulation bug and verifies the intended cap behavior under MAX_UTILIZATION.
As per coding guidelines, if a PR only touches unittest or narrow unit scope, QA integration test-list updates are unnecessary or optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/executor/test_py_scheduler.py` around lines 2076 -
2100, The test test_max_utilization_peft_page_accumulation demonstrates that
PyCapacityScheduler under CapacitySchedulerPolicy.MAX_UTILIZATION must
accumulate PEFT page usage across admitted requests; ensure
MockPeftCacheManager(max_pages=25, pages_per_request=10) behavior is used by
PyCapacityScheduler.schedule_request so that cumulative pages from admitted
tasks are tracked and the third request is rejected. Locate the test
test_max_utilization_peft_page_accumulation and the scheduling logic in
PyCapacityScheduler.schedule_request and confirm the scheduler queries
peft_cache_manager (MockPeftCacheManager) for current allocated pages and adds
the candidate request's pages_per_request before admitting; if accumulation is
missing, update schedule_request to sum existing allocated PEFT pages plus the
new request's pages and enforce max_pages when deciding to admit or reject.
Ensure the test asserts len(fitting) == 2 still passes after the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unittest/_torch/executor/test_py_scheduler.py`:
- Line 2097: The test unpacks three values from scheduler.schedule_request into
fitting, disagg, paused but only fitting is used; rename the unused unpacked
variables to underscore-prefixed names (e.g., _disagg and _paused) to satisfy
RUF059. Update the unpacking at the call to scheduler.schedule_request([r0, r1,
r2]) so it assigns fitting, _disagg, _paused and leave all other references to
fitting unchanged.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: dab3f1ba-2ceb-4816-b02e-c09f4ccd41b5

📥 Commits

Reviewing files that changed from the base of the PR and between bf96323 and 1668934.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/pyexecutor/scheduler/scheduler.py
  • tests/unittest/_torch/executor/test_py_scheduler.py

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
@achartier
Collaborator Author

/bot run --disable-fail -fast

@tensorrt-cicd
Collaborator

PR_Github #45948 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: -fast

Link to invocation

@achartier
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45950 [ run ] triggered by Bot. Commit: 52e94ef Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45950 [ run ] completed with state FAILURE. Commit: 52e94ef
/LLM/main/L0_MergeRequest_PR pipeline #36105 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

